@pcastonguay
Collaborator

@pcastonguay pcastonguay commented May 28, 2025

[nvbug 5294316] fix: Fix queued request stats

Description

Fixed per-request stats for queued requests

Test Coverage

Added a unit test covering per-request stats for queued requests.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
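
To make the command grammar above concrete, here is a minimal sketch of how a comment such as `/bot run --disable-fail-fast` could be parsed with Python's standard `argparse`. This is not the actual bot implementation, which is not shown in this thread; the function names `build_parser` and `parse_bot_comment` are hypothetical, and only the subcommands and flags listed in the help text are modeled.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the subcommands listed in the help text."""
    parser = argparse.ArgumentParser(prog="/bot")
    sub = parser.add_subparsers(dest="command", required=True)

    # run: all flags are optional, matching the "(OPTIONAL)" markers in the help.
    run = sub.add_parser("run")
    for flag in (
        "--disable-fail-fast",
        "--skip-test",
        "--only-multi-gpu-test",
        "--disable-multi-gpu-test",
        "--add-multi-gpu-test",
        "--post-merge",
    ):
        run.add_argument(flag, action="store_true")
    # These flags take a quoted, comma-separated string, e.g. "A30, H100_PCIe".
    for flag in ("--stage-list", "--gpu-type", "--extra-stage"):
        run.add_argument(flag, default=None)

    sub.add_parser("kill")

    # skip: the help text states that --comment is required.
    skip = sub.add_parser("skip")
    skip.add_argument("--comment", required=True)

    sub.add_parser("reuse-pipeline")
    return parser


def parse_bot_comment(comment: str) -> argparse.Namespace:
    """Split a PR comment shell-style and parse everything after '/bot'."""
    tokens = shlex.split(comment)
    if not tokens or tokens[0] != "/bot":
        raise ValueError("not a /bot command")
    return build_parser().parse_args(tokens[1:])
```

For example, `parse_bot_comment('/bot run --gpu-type "A30, H100_PCIe"')` yields a namespace whose `command` is `run` and whose `gpu_type` is the quoted string, while `/bot skip` without `--comment` is rejected by `argparse`.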

@pcastonguay pcastonguay requested a review from a team as a code owner May 28, 2025 01:38
@pcastonguay pcastonguay requested a review from lfr-0531 May 28, 2025 01:38
@pcastonguay pcastonguay changed the title fix: Fix queued request stats [nvbug 5294316] fix: Fix queued request stats May 28, 2025
@pcastonguay
Collaborator Author

/bot run

@pcastonguay pcastonguay requested a review from SimengLiu-nv May 28, 2025 01:40
@tensorrt-cicd
Collaborator

PR_Github #6686 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6686 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4883 completed with status: 'FAILURE'

@pcastonguay pcastonguay force-pushed the fix_queued_req_stats branch from 362d8c8 to 092e5a7 Compare May 29, 2025 00:11
@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6820 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6820 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4966 completed with status: 'FAILURE'

@pcastonguay pcastonguay force-pushed the fix_queued_req_stats branch from 092e5a7 to 1734ade Compare May 29, 2025 15:11
@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@SimengLiu-nv
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6949 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6949 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5045 completed with status: 'FAILURE'

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@pcastonguay pcastonguay requested a review from schetlur-nv May 30, 2025 02:33
@pcastonguay pcastonguay force-pushed the fix_queued_req_stats branch from 1734ade to 8d42691 Compare May 30, 2025 02:34
@pcastonguay
Collaborator Author

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #6996 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6998 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6996 [ run ] completed with state ABORTED

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #6998 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 8d42691

@tensorrt-cicd
Collaborator

PR_Github #6999 [ run ] triggered by Bot

@ixlmar
Collaborator

ixlmar commented May 30, 2025

Note #4734. Meanwhile, I spotted a related test, but could not immediately figure out why that had not been failing previously.

@pcastonguay pcastonguay force-pushed the fix_queued_req_stats branch from 8d42691 to 8bfc803 Compare May 30, 2025 18:30
@pcastonguay
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7101 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #6999 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #7101 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5134 completed with status: 'FAILURE'

@pcastonguay pcastonguay force-pushed the fix_queued_req_stats branch from 8bfc803 to 347ca9f Compare June 2, 2025 12:22
@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #7212 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7212 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5221 completed with status: 'SUCCESS'

@pcastonguay pcastonguay force-pushed the fix_queued_req_stats branch from 347ca9f to 30adfcb Compare June 2, 2025 23:35
@pcastonguay
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #7251 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7251 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #7212 for commit 30adfcb

Signed-off-by: Patrice Castonguay <[email protected]>
Signed-off-by: Patrice Castonguay <[email protected]>
@pcastonguay pcastonguay force-pushed the fix_queued_req_stats branch from 30adfcb to 1d2e157 Compare June 3, 2025 12:06
@pcastonguay
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #7345 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7345 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #7212 for commit 1d2e157

@pcastonguay pcastonguay merged commit 01f29ce into NVIDIA:main Jun 3, 2025
3 checks passed
darraghdog pushed a commit to darraghdog/TensorRT-LLM that referenced this pull request Jun 3, 2025
Signed-off-by: Patrice Castonguay <[email protected]>
Signed-off-by: darraghdog <[email protected]>

5 participants